{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ncjQuwOgT4W3"
      },
      "source": [
        "# 🗣️ Speech-AI-Forge Colab\n",
        "\n",
        "👋本脚本基于 [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge) 构建。如果此项目对你有帮助，欢迎到 github 为我们 star 支持！也欢迎提交 pr issues~\n",
        "\n",
        "## 运行指南\n",
        "\n",
        "1. 在菜单栏中选择 **代码执行程序**。\n",
        "2. 点击 **全部运行**。\n",
        "\n",
        "运行完成后，请在下方日志中找到如下信息：\n",
        "\n",
        "```\n",
        "Running on public URL: https://**.gradio.live\n",
        "```\n",
        "\n",
        "该链接即为您可以访问的公网地址。\n",
        "\n",
        "> 注意：如果在安装包时提示需要重启，请选择 \"否\"。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fvzfP6Tn8nOL"
      },
      "source": [
        "## 环境部署"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "09UPJy1gP045",
        "outputId": "079a9d2c-54aa-4ff5-a969-70e27029f3ed"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "# Skip restarting message in Colab\n",
        "import sys; modules = list(sys.modules.keys())\n",
        "for x in modules: sys.modules.pop(x) if \"PIL\" in x or \"google\" in x else None\n",
        "\n",
        "# 1. Clone the repository\n",
        "!git clone https://github.com/lenML/Speech-AI-Forge\n",
        "\n",
        "# 2. Change directory to the repository\n",
        "%cd Speech-AI-Forge\n",
        "\n",
        "# 3. Install ffmpeg and rubberband-cli\n",
        "!apt-get update -y\n",
        "!apt-get install -y ffmpeg rubberband-cli\n",
        "\n",
        "# 4. Install dependencies\n",
        "!pip install -r requirements.txt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hLupWhOXT1kt"
      },
      "source": [
        "## 下载模型"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "gZBGws7c8nON",
        "outputId": "73de300c-da8e-4fd5-ea37-73cfecd2f2f7"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "from scripts import dl_chattts, dl_enhance\n",
        "from scripts.downloader import fish_speech_1_4, cosyvoice2, fire_red_tts, f5_tts, vocos_mel_24khz, faster_whisper, open_voice\n",
        "\n",
        "# @markdown ## 模型下载说明\n",
        "# @markdown 大部分模型的大小接近 2GB，请确保有足够的存储空间和网络带宽。  <br/>\n",
        "# @markdown **注意**：至少必须选择一个 TTS 模型。如果没有选择，将默认下载 ChatTTS。\n",
        "\n",
        "# @markdown ### Hugging Face Token (可选)\n",
        "# @markdown 部分模型可能需要配置 Hugging Face Token 才能下载。 如果您需要使用这些模型,请在此处输入您的 Token。您可以从 [Hugging Face](https://huggingface.co/settings/tokens) 获取您的 Token.\n",
        "HF_TOKEN = \"\" #@param {type:\"string\"}\n",
        "\n",
        "if HF_TOKEN:\n",
        "  os.environ[\"HF_TOKEN\"] = HF_TOKEN # 设置 \"HF_TOKEN\" 环境变量\n",
        "  print(\"HF_TOKEN 环境变量已配置。\")  # Helpful feedback\n",
        "else:\n",
        "  print(\"未提供 HF_TOKEN, 跳过环境变量配置.\")  # Good to know why\n",
        "\n",
        "# @markdown ### TTS 模型\n",
        "# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.\n",
        "download_chattts = True  # @param {\"type\":\"boolean\", \"placeholder\":\"下载 ChatTTS 模型\"}\n",
        "# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching\n",
        "download_f5_tts = False  # @param {\"type\":\"boolean\"}\n",
        "# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.\n",
        "download_cosyvoice = False  # @param {\"type\":\"boolean\"}\n",
        "# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System\n",
        "download_fire_red_tts = False  # @param {\"type\":\"boolean\"}\n",
        "# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution\n",
        "download_fish_speech = False  # @param {\"type\":\"boolean\"}\n",
        "\n",
        "# @markdown ### ASR 模型\n",
        "# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision\n",
        "download_whisper = True  # @param {\"type\":\"boolean\"}\n",
        "\n",
        "# @markdown ### Clone Voice 模型\n",
        "# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.\n",
        "download_open_voice = False  # @param {\"type\":\"boolean\"}\n",
        "\n",
        "# @markdown ### 增强模型\n",
        "# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement\n",
        "download_enhancer = True  # @param {\"type\":\"boolean\"}\n",
        "\n",
        "\n",
        "# 检查是否至少选择了一个 TTS 模型\n",
        "if not any([download_chattts, download_fish_speech, download_cosyvoice, download_fire_red_tts, download_f5_tts]):\n",
        "    print(\"未选择任何 TTS 模型，默认下载 ChatTTS...\")\n",
        "    download_chattts = True\n",
        "\n",
        "dl_source=\"huggingface\"\n",
        "\n",
        "# TTS 模型下载\n",
        "if download_chattts:\n",
        "    print(\"下载 ChatTTS...\")\n",
        "    dl_chattts.ChatTTSDownloader()(source=dl_source)\n",
        "    print(\"下载 ChatTTS, 完成\")\n",
        "\n",
        "if download_fish_speech:\n",
        "    print(\"下载 FishSpeech...\")\n",
        "    fish_speech_1_4.FishSpeech14Downloader()(source=dl_source)\n",
        "    print(\"下载 FishSpeech, 完成\")\n",
        "\n",
        "if download_cosyvoice:\n",
        "    print(\"下载 CosyVoice...\")\n",
        "    cosyvoice2.CosyVoice2Downloader()(source=dl_source)\n",
        "    print(\"下载 CosyVoice, 完成\")\n",
        "\n",
        "if download_fire_red_tts:\n",
        "    print(\"下载 FireRedTTS...\")\n",
        "    fire_red_tts.FireRedTTSDownloader()(source=dl_source)\n",
        "    print(\"下载 FireRedTTS, 完成\")\n",
        "\n",
        "if download_f5_tts:\n",
        "    print(\"下载 F5TTS...\")\n",
        "    f5_tts.F5TTSDownloader()(source=dl_source)\n",
        "    vocos_mel_24khz.VocosMel24khzDownloader()(source=dl_source)\n",
        "    print(\"下载 F5TTS, 完成\")\n",
        "\n",
        "# ASR 模型下载\n",
        "if download_whisper:\n",
        "    print(\"下载 Whisper...\")\n",
        "    faster_whisper.FasterWhisperDownloader()(source=dl_source)\n",
        "    print(\"下载 Whisper, 完成\")\n",
        "\n",
        "# Clone Voice 模型下载\n",
        "if download_open_voice:\n",
        "    print(\"下载 OpenVoice...\")\n",
        "    open_voice.OpenVoiceDownloader()(source=dl_source)\n",
        "    print(\"下载 OpenVoice, 完成\")\n",
        "\n",
        "# 增强模型下载\n",
        "if download_enhancer:\n",
        "    print(\"下载 ResembleEnhance...\")\n",
        "    dl_enhance.ResembleEnhanceDownloader()(source=dl_source)\n",
        "    print(\"下载 ResembleEnhance, 完成\")\n",
        "\n",
        "print(\"所有选定的模型已下载完成！\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RtpfdFem8nOO"
      },
      "source": [
        "## 运行 WebUI"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "hi6eW_KLDc_p",
        "outputId": "5de77d0c-a3ee-4b86-87a8-490fdcdda794"
      },
      "outputs": [],
      "source": [
        "!nvcc --version"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "J5CkIs0nFZKi",
        "outputId": "b21554f2-e82b-422e-dc82-dfc675bc6c49"
      },
      "outputs": [],
      "source": [
        "!nvidia-smi"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "X8x_dWk88nOO",
        "outputId": "9029743a-d214-4385-a241-ef5de1f2c110"
      },
      "outputs": [],
      "source": [
        "!python webui.py --share --language=zh-CN"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
